Abstract

Random walk based distributed algorithms make use of a token that circulates in the system according to a random walk scheme to achieve their goal. To study their efficiency and compare it to that of the deterministic solutions, one is led to compute certain quantities, namely the hitting times and the cover time. Until now, only bounds on these quantities were known.

First, this paper presents two generalizations of the notions of hitting and cover times to weighted graphs. Indeed, the properties of random walks on symmetrically weighted graphs provide interesting results on random walk based distributed algorithms, such as local load balancing. Both of these generalizations are proposed to precisely represent the behaviour of these algorithms, and to take into account what the weights represent.

Then, we propose an algorithm to compute the $n^2$ hitting times on a weighted graph of $n$ vertices, which we improve to obtain an $O(n^3)$ complexity. This complexity is the lowest up to now. This algorithm computes both of the generalizations that we propose for the hitting times on a weighted graph.

Finally, we provide the first algorithm to compute the cover time (in both senses) of a graph. We improve it to achieve a complexity of $O(n^3 2^n)$. The algorithms that we present are all robust to a topological change in a limited number of edges. This property allows us to use them on dynamic graphs.

1 Introduction



The constant evolution of networking makes it possible today to use several computers at a time to 
carry out a given computation. A distributed system is defined as a set of interconnected computing 
devices called sites or nodes, cooperating in order to achieve a computation. A distributed system is 
usually modeled by a finite undirected graph G(V, E), where V is the set of sites and E is the set of 
communication links (either physical or logical).

This paper focuses on random walk based distributed algorithms. These algorithms are token-based 
algorithms - the token circulation mechanism is a well-known paradigm to achieve a global task in 
distributed computing. These algorithms have been designed to remove the strong hypotheses on the 
topology required by a deterministic token circulation scheme. The token message circulates in the 
system, and at each step, the site that owns the token sends it to one of its neighbors chosen at random. 

Token-based algorithms are attractive for their low message complexity, whereas the main interest of flooding algorithms is their low time complexity. On many particular topologies, a deterministic token circulation scheme can be designed to efficiently visit all the sites in the network: on a ring, the token can turn clockwise; on a chain, it can go back and forth; on a tree, a depth-first search gives good results; and on a complete graph, the token can visit the sites according to any strict order. However, these schemes suffer from a lack of adaptability because they are designed for one particular topology and cannot easily be adapted to fit other ones. On the other hand, random walk based distributed algorithms can function on any topology and require only a local knowledge of the topology (except for the standard assumption that the network remains connected). Random walks have the interesting property of adapting to the insertion or deletion of sites or links in the network without any modification of the code (as long as the network remains connected; otherwise, no communication is possible between the connected components and the only solution is to launch one algorithm in each component). With the increasing dynamicity of networks, this feature is becoming crucial: redesigning a new browsing scheme at each modification of the topology is impossible, and flooding-based procedures lead to the congestion of many networks.

The token circulation paradigm has been widely studied in the deterministic case. Original solutions 
using random walks have been designed to solve various problems related to distributed computing, e.g.
[13] for self-stabilizing mutual exclusion, [Ej for mobile agent in wireless networks, [3] for token circulation 
in a dynamic and faulty environment or as an alternative to flooding in decentralized and unstructured 
peer-to-peer networks (especially to achieve low bandwidth consumption by control messages) [17].

The time complexity of random walk based algorithms, like that of deterministic token based algorithms, is the number of "steps" the algorithm takes to achieve the network traversal. Considering only one walk at a time (which is the case we are dealing with), it is also equal to the message complexity.

Random walk-based distributed algorithms must then be analyzed through probabilistic tools. It can be shown that a random walk will visit all the sites in a graph in a finite time, but there is no hope of giving hard bounds on the time it will take: the classical worst-case analysis cannot be applied in this case, so we are led to use an average-case analysis.

The cover time $C$, the average time to visit all nodes in the system, and the hitting time $h_{ij}$, the average time it takes to reach a node $j$ for the first time starting from a given node $i$, are the first two important values that arise in the analysis of random walk-based distributed algorithms. For instance, the cover time is the average time required to build a spanning tree with the random walk token circulation algorithm [1], and the hitting time is the average time it takes to enter a critical section [13].

Related Works The mathematical background can be found in [16].

Many bounds on hitting times and cover times are available. [5] proves that:

$$h_{ij} \le \frac{4}{27} n^3 - \frac{1}{9} n^2 + O(n)$$

[IT] and [TU] show that:

$$(1 + o(1))\, n \ln n \le C \le \left(\frac{4}{27} + o(1)\right) n^3$$



In [12], the authors provide a polylogarithmic approximation of the cover time with a polynomial complexity. But the approximation can lead to severe biases: on the path, for instance, they obtain $(n-1)^2 \ln n$ in place of $(n-1)^2$. [J¥J establishes that:

$$\frac{1}{8} M \le C \le 10^5 M (\ln\ln n)^2$$

with $M = \max\{\kappa_S \ln|S| \,/\, S \subseteq V\}$, $\kappa_S = \max\{\kappa_{ij} \,/\, i,j \in S\}$, where $\kappa_{ij} = h_{ij} + h_{ji}$ is the commute time. The authors provide a polynomial approximation of $M$ within a factor 2 (a naive computation of $M$ would have an exponential complexity since $|\mathcal{P}(V)| = 2^{|V|}$). Even if this is a theoretically good approximation ($O((\ln\ln n)^2)$ means a slow divergence), the actual ratio is $8 \times 10^5 \times (\ln\ln n)^2$, and the factor $8 \times 10^5$ is very high with respect to $(\ln\ln n)^2$ in most concrete applications, especially in distributed computing.



Contributions All of these results but the last one take into account only the size of the graph, not its topology. Yet the topology explains the difference between the behavior of the walk on the various topologies, such as the "good" cover time on the complete graph and the "bad" one on the lollipop graph (with respect to their sizes). In order to design insertion schemes and topologies in which the use of random walks is efficient, we provide algorithms to compute the exact values of hitting and cover times in a graph. First, we extend these two notions to weighted graphs, which provide a more general representation of distributed systems. The weight represents the quantity that ranks the neighbors and makes the token visit one neighbor rather than another. It can be used to represent the bandwidth of a link, which locally balances the load on the links. Weights also appear when studying the average time it takes to reach a set of sites. For example, in file-sharing protocols, resources are generally replicated on several sites. To obtain the average time to first hit a site in the set $O$ of all owners of the resource, we consider the graph built from $G$ by removing $O$ and adding a single site $o$. The weight of the link between $o$ and a site $i$ is the sum of the weights of all links between $i$ and a site of $O$. Then, the hitting time from $i$ to $o$ is the quantity we were searching for.
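
The contraction of the owner set described above can be sketched as follows. This is only an illustration: the function name `contract_owners`, the edge representation (a dictionary keyed by `frozenset` pairs) and the sample network are assumptions, not part of the paper.

```python
from collections import defaultdict


def contract_owners(weights, owners, merged="o"):
    """Merge every site of `owners` into a single site `merged`.

    `weights` maps frozenset({u, v}) to the weight of the undirected link u-v.
    All links between a site i outside `owners` and the owners are replaced
    by one link i-merged whose weight is the sum of their weights.
    Links joining two owners disappear.
    """
    contracted = defaultdict(int)
    for edge, w in weights.items():
        u, v = tuple(edge)
        u = merged if u in owners else u
        v = merged if v in owners else v
        if u != v:  # drop links internal to the owner set
            contracted[frozenset((u, v))] += w
    return dict(contracted)


# Hypothetical network: sites "c" and "d" both own the resource.
links = {
    frozenset(("a", "b")): 1,
    frozenset(("a", "c")): 2,
    frozenset(("a", "d")): 3,
    frozenset(("c", "d")): 5,
    frozenset(("b", "d")): 4,
}
merged_links = contract_owners(links, owners={"c", "d"})
```

In this sample, the two links from "a" to the owners (weights 2 and 3) become a single link of weight 5 to the new site, and the hitting time to that site in the contracted graph is the time to first reach any owner.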

Then we propose an original algorithm to efficiently compute the hitting times. This algorithm provides a tool to compare the efficiency of random walk based distributed algorithms on large distributed networks. Finally, we propose a method to compute the cover time exactly. As far as we know, this is the first solution ever designed to solve this problem. Our method provides better information than the previously known bounds presented above. Indeed, the result not only takes topological information into account but is also fairly robust (the insertion or deletion of a limited number of communication links does not alter the cover time meaningfully).

Outline of the paper The first part of this article recalls previous results on hitting and cover times. The second part presents a new efficient method to compute the hitting times between all pairs of nodes in a graph with one matrix inversion. It requires some preliminary work to generalize previous results on hitting times. The third part presents the first algorithm to compute the cover time of a graph exactly, based on the hitting times computation presented before. Then, we conclude by offering some new perspectives. In the sequel, we recall some results demonstrated by Chandra, Raghavan et al., and Tetali. We then derive some more general results, which we exploit to find an efficient algorithm to compute the hitting times in a graph. Finally, we offer an example of the execution of this algorithm.

2 Preliminaries 

Random walks have been the subject of a wide applied mathematics literature. Random walks are Markov chains, i.e. memoryless stochastic processes: if $(X_n)_{n \in \mathbb{N}}$ is a Markov chain ($X_n$ is the site that owns the token at time $n$), then $\forall n \in \mathbb{N}$:

$$P[X_{n+1} = a \mid X_n = a_n, X_{n-1} = a_{n-1}, \dots, X_0 = a_0] = P[X_{n+1} = a \mid X_n = a_n]$$

In this paper, the distributed system topology is represented by a dynamic, connected, undirected, positively real-weighted graph. We denote by $\mathcal{N}_i$ the set of neighbors of node $i$ (the set of nodes to which $i$ is connected).

The weight function is denoted by $\omega$. For each edge $(i,j)$ a numerical value $\omega(i,j)$ is defined. Here the weight assignment is symmetric, i.e. $\omega(i,j) = \omega(j,i)$. For a site $i$, we denote $\omega(i) = \sum_{j \in \mathcal{N}_i} \omega(i,j)$ and $\omega(G) = \sum_{(i,j) \in E} \omega(i,j)$. The graph is said to be unweighted if no weight assignment is assumed.

A random walk is then a sequence of nodes of G visited by a token that starts at a node i and visits 
other vertices according to the following transition rule: if a token is at i at time t then, at time t+1, it 
will be at one of the neighbors of i chosen at random among all of them proportionally to the weight of 
the link adjacent to i. 




Figure 1: the electrical circuit built from a weighted graph 



A random walk on a weighted graph is such that, being on $i$, it moves from $i$ to $j$ at the next step with the probability:

$$P[X_{n+1} = j \mid X_n = i] = \frac{\omega(i,j)}{\omega(i)}$$

If the graph is unweighted, $\omega(i,j) = 1$ if $i$ and $j$ are neighbours, and 0 otherwise. The probability that the walk will eventually hit a given vertex is 1: starting on any site, the token will eventually hit a given site, even if it takes a long time (actually, no hard bound exists on this time).
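
The transition rule can be illustrated with a short sketch that derives the transition probabilities from symmetric edge weights. The function name `transition_probabilities`, the input format (each undirected edge listed once) and the sample weights are assumptions made for this example.

```python
from fractions import Fraction


def transition_probabilities(weights):
    """Return probs[i][j] = w(i, j) / w(i) for a symmetric weight assignment.

    `weights` maps a pair (i, j) to the positive weight of the undirected
    edge i-j; each edge is listed once.
    """
    neighbors = {}
    for (i, j), w in weights.items():
        neighbors.setdefault(i, {})[j] = Fraction(w)
        neighbors.setdefault(j, {})[i] = Fraction(w)
    probs = {}
    for i, adjacent in neighbors.items():
        total = sum(adjacent.values())  # w(i), the sum of weights around i
        probs[i] = {j: w / total for j, w in adjacent.items()}
    return probs


# Hypothetical weighted triangle.
example = {(1, 2): 1, (1, 3): 3, (2, 3): 2}
P = transition_probabilities(example)
```

With these weights, the token on site 1 moves to site 3 three times as often as to site 2, although the weights themselves are symmetric.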

According to the interpretation given to the weights on the links of the graph, the characteristic values 
presented above can each be given two definitions. 

Definition 1 We call hitting time $h_{ij}$ in the first sense (respectively cover time in the first sense) the average number of edges visited by the random walk starting at site $i$ to visit for the first time a site $j$ (respectively to cover the graph starting at site $i$).

Definition 2 We call hitting time $h_{ij}$ in the second sense (respectively cover time in the second sense) the average total weight of edges visited by the random walk starting at site $i$ to visit for the first time a site $j$ (respectively to cover the graph starting at site $i$).

The commute time is denoted by $\kappa_{ij} = h_{ij} + h_{ji}$.

Most of the previous results deal with the definitions in the first sense. In the next section, we extend 
the results to the second sense definition. 
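
To make the two definitions concrete, the sketch below computes both quantities directly from their definitions by first-step analysis: the expected number of edges traversed satisfies $h_i = 1 + \sum_j p_{ij} h_j$, and the expected total weight traversed satisfies $g_i = \sum_j p_{ij} (\omega(i,j) + g_j)$, with value 0 at the target. This is not the resistance-based method of the paper; the names `solve` and `hitting_times`, the integer weights and the sample graph are assumptions.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def hitting_times(adj, target, second_sense=False):
    """Hitting times to `target` from every node.

    adj[i][j] is the integer weight of edge i-j (symmetric).
    First sense: each traversed edge costs 1.
    Second sense: each traversed edge costs its weight.
    """
    nodes = [v for v in adj if v != target]
    index = {v: k for k, v in enumerate(nodes)}
    a, b = [], []
    for i in nodes:
        w_i = sum(adj[i].values())
        row = [Fraction(0)] * len(nodes)
        row[index[i]] += 1
        cost = Fraction(0)
        for j, w in adj[i].items():
            p = Fraction(w, w_i)  # transition probability from i to j
            cost += p * (w if second_sense else 1)
            if j != target:
                row[index[j]] -= p
        a.append(row)
        b.append(cost)
    result = dict(zip(nodes, solve(a, b)))
    result[target] = Fraction(0)
    return result


# Hypothetical path a - b - c with w(a, b) = 1 and w(b, c) = 2.
path = {"a": {"b": 1}, "b": {"a": 1, "c": 2}, "c": {"b": 2}}
first = hitting_times(path, "c")
second = hitting_times(path, "c", second_sense=True)
```

On this path, the walk from "a" to "c" takes 3 edges on average but accumulates an average weight of 4, which shows that the two senses genuinely differ on weighted graphs.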



Random walks and resistances A tight link exists between random walks and electrical networks. We build an electrical network from a graph $G$ by replacing each of its edges by a resistor. The conductance value (i.e. the inverse of the resistance) is equal to the weight of the edge in the graph it replaces (see Figure 1).

Let $r(i,j)$ (resp. $c(i,j)$) denote the resistance (resp. conductance) of the resistor between two adjacent nodes $i$ and $j$. The equivalent resistance $R_{ij}$ (resistance of the electrical network) between $i$ and $j$ is defined as the resistance of the resistor to be placed between $i$ and $j$ to ensure the same electrical properties as the whole circuit. $R$ denotes the maximal equivalent resistance between two nodes of the network, i.e. $R = \max_{i,j \in V} R_{ij}$.



Previous results for unweighted graphs A tight relationship between resistances in electric networks and characteristic values of random walks, such as the hitting times and the cover time, has been established [7]. In particular, it has been shown that, for random walks on unweighted graphs:

Lemma 1

$$\kappa_{ij} = 2 m R_{ij}$$

where $i$ and $j$ denote two distinct vertices and $m$ the number of edges.
From this equation, we have:

Lemma 2

$$mR \le C \le O(mR \log n)$$

In [18], hitting times on unweighted graphs are expressed only in terms of resistances:

Lemma 3

$$h_{ij} = m R_{ij} + \frac{1}{2} \sum_{k \in V} \deg(k) (R_{jk} - R_{ik})$$
kev 

Resistances computation Thanks to Millman's theorem, we are able to compute all the resistances, as we have shown in [5].

Theorem 1 (Millman's theorem) Consider an electrical network. On any node $i$, the following relation holds:

$$V_i = \frac{\sum_{j \in \mathcal{N}_i} \frac{V_j}{r(i,j)}}{\sum_{j \in \mathcal{N}_i} \frac{1}{r(i,j)}}$$

so that

$$\frac{V_i - V_{j_1}}{r(i,j_1)} + \frac{V_i - V_{j_2}}{r(i,j_2)} + \dots + \frac{V_i - V_{j_n}}{r(i,j_n)} = 0$$

where $j_1, \dots, j_n$ are the neighbors of $i$ and $V_{j_1}, \dots, V_{j_n}$ are the voltages of each of these nodes.

3 Hitting times in the first and second sense 

3.1 Hitting times on weighted graphs 

In [2], we provide an automatic way to compute resistances on unweighted graphs. Thanks to this method, we can deduce from Lemma 3 above the value of the hitting time between two nodes on such a graph. In this section, in order to generalize this method, we establish the relation between hitting times and equivalent resistances for weighted graphs.

Theorem 2

$$h_{ij} = \omega(G) R_{ij} + \frac{1}{2} \sum_{k \in G} \omega(k) (R_{jk} - R_{ik}) \qquad (1)$$



Proof The following reasoning is inspired by [18].

Let $U_k^{ij}$ be the expected number of visits to $k$ in a random walk from $i$ to $j$. Then $U_j^{ij} = 0$, and:

$$U_k^{ij} = \sum_{l \in N(k)} U_l^{ij} p_{lk} = \sum_{l \in N(k)} U_l^{ij} \frac{\omega(l,k)}{\omega(l)}$$

Thus,

$$\sum_{l \in N(k)} \omega(l,k) \frac{U_l^{ij}}{\omega(l)} = \omega(k) \frac{U_k^{ij}}{\omega(k)} \qquad (2)$$

On the other hand, according to Kirchhoff's current law, when a unit current flows from $i$ to $j$, on each node $k$ except $i$ and $j$ ($V_k^{ij}$ denotes the potential on node $k$ when a unit current flows from $i$ to $j$, and $V_{kl}^{ij} = V_k^{ij} - V_l^{ij}$; for the sake of legibility, we may omit the superscript when it is obvious that the current flows from $i$ to $j$):

$$\begin{aligned}
\sum_{l \in N(k)} c_{lk} V_{lk} &= 0 \\
\sum_{l \in N(k)} c_{lk} (V_l - V_k) &= 0 \\
\sum_{l \in N(k)} c_{lk} V_l &= \Big( \sum_{l \in N(k)} c_{lk} \Big) V_k \\
\sum_{l \in N(k)} \omega(l,k) V_l &= \omega(k) V_k
\end{aligned} \qquad (3)$$

From equations (2) and (3), since there is a single steady state in an electrical circuit, we deduce that $\forall k$, $V_k^{ij} = \lambda \frac{U_k^{ij}}{\omega(k)}$, with $\lambda$ a factor such that the intensity circulating between $i$ and $j$ is 1, i.e. $\sum_{k \in N(j)} c_{kj} \lambda \frac{U_k^{ij}}{\omega(k)} = 1$. If there were several solutions to (3) with the same potentials on nodes $i$ and $j$, there would be several electrical steady states in this circuit. Now, $\sum_{k \in N(j)} c_{kj} \lambda \frac{U_k^{ij}}{\omega(k)} = \lambda \sum_{k \in N(j)} \omega(k,j) \frac{U_k^{ij}}{\omega(k)} = \lambda$, for $\sum_{k \in N(j)} U_k^{ij} \frac{\omega(k,j)}{\omega(k)}$ is the average number of traversals toward $j$ of a random walk from $i$ to $j$, and this can only be 1. Thus, $\lambda = 1$, and

$$V_k^{ij} = \frac{U_k^{ij}}{\omega(k)} \qquad (4)$$

Since $h_{ij}$ is the expected time for the walk to go from $i$ to $j$, $h_{ij}$ is the sum of the average number of visits of each site in the graph in the walk from $i$ to $j$. By linearity of the expectation, $h_{ij} = \sum_{k \in G} U_k^{ij}$. The expected commute time $\kappa_{ij}$, which is the average time for a random walk to go from $i$ to $j$ and back, is:

$$\begin{aligned}
\kappa_{ij} &= \sum_{l \in G} U_l^{ij} + \sum_{l \in G} U_l^{ji} \\
&= \sum_{l \in G} \omega(l) (V_{lj}^{ij} - V_{li}^{ij}) \\
&= V_{ij}^{ij} \sum_{l \in G} \omega(l) \\
&= 2 \omega(G) R_{ij}
\end{aligned} \qquad (5)$$

where $R_{ij}$ is the equivalent resistance of the network between $i$ and $j$, i.e. the voltage between those two nodes when a unit current enters $i$ and leaves $j$. Note that $V_{li}^{ji} = -V_{li}^{ij}$, since the current goes from $i$ to $j$ instead of going from $j$ to $i$.

Thanks to Kirchhoff's current law applied on $j$, and using $V_k^{ik} = 0$ (the potential is defined up to a constant, so we can assign an arbitrary value to one potential in the network; since the resistance is defined by the difference between two potentials, this does not affect it), we have:

V lk = 3 - V! 1 

3 Rf + Rf 

with $R^{ijk}_{ij}$, $R^{ijk}_{ik}$ and $R^{ijk}_{jk}$ the resistances of the resistors to be placed between $i$, $j$ and $k$ to ensure the same electrical properties as the original network: the resistances that give the potentials on any two of those three nodes allow us to obtain the third one as in the whole graph. As a result of applying this law on $k$, we obtain:

Vl k = Jk 



then: 



yik 



' r%+ r T 



U jk 



r>ijk r/ijk , r>ijk r>ijk , rtijk nijk 
n ij n ik "T n ji n jk ~T n ki n kj 



Thus, this formula being symmetrical in j and k, V 1 — V k . So 

Rik = Vk 



= Vl k + Vj k - Vj k 

= vt + vt j - V j k 

= v; k + v;i 
= v; k + vH 

_ U T K 



u>{j) w{k) 



Thus, 

Rik + Rkj — Rij = 




Then, 



u)(i) w(j) 

Tjij 
O u k 



$$U_k^{ij} = \frac{1}{2} \omega(k) (R_{ij} + R_{jk} - R_{ik})$$

and $h_{ij} = \sum_{k \in G} U_k^{ij}$.

□ 
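
As a quick sanity check of formula (1), consider a small example that is not taken from the paper: the path $a - b - c$ with unit weights (the graph and vertex names are assumptions made for illustration). Here $\omega(G) = 2$, $\omega(a) = \omega(c) = 1$, $\omega(b) = 2$, $R_{ab} = R_{bc} = 1$ and $R_{ac} = 2$, so that:

```latex
\begin{aligned}
h_{ac} &= \omega(G) R_{ac}
  + \tfrac{1}{2}\bigl[\omega(a)(R_{ca} - R_{aa}) + \omega(b)(R_{cb} - R_{ab}) + \omega(c)(R_{cc} - R_{ac})\bigr] \\
       &= 2 \cdot 2 + \tfrac{1}{2}\bigl[1 \cdot (2 - 0) + 2 \cdot (1 - 1) + 1 \cdot (0 - 2)\bigr] = 4, \\
h_{ba} &= \omega(G) R_{ba}
  + \tfrac{1}{2}\bigl[\omega(a)(R_{aa} - R_{ba}) + \omega(b)(R_{ab} - R_{bb}) + \omega(c)(R_{ac} - R_{bc})\bigr] \\
       &= 2 \cdot 1 + \tfrac{1}{2}\bigl[1 \cdot (0 - 1) + 2 \cdot (1 - 0) + 1 \cdot (2 - 1)\bigr] = 3.
\end{aligned}
```

Both values agree with a direct first-step computation on this path: from $a$, $h_{ac} = 1 + h_{bc}$ with $h_{bc} = 1 + \frac{1}{2} h_{ac}$ gives $h_{ac} = 4$, and from $b$, $h_{ba} = 1 + \frac{1}{2} h_{ca}$ with $h_{ca} = 1 + h_{ba}$ gives $h_{ba} = 3$.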

3.2 An efficient method to compute resistances 

The basic method Our first solution (detailed in [2]) to compute the equivalent resistance between two given nodes $i$ and $j$ consists in applying a 1 V potential on node $i$ and 0 V on node $j$. $R_{ij}$ is obtained as the ratio of the potential difference $V_i - V_j$ to the current circulating between these two nodes. The latter is established from the knowledge of the potentials at all the nodes adjacent to $i$ or $j$, given by the application of Millman's theorem (Theorem 1). An equivalent resistance is then computed by one matrix inversion (complexity $O(n^3)$), but $2n$ equivalent resistance computations are also necessary to obtain one hitting time by formula (1).

We now propose an improved method. The basic idea comes from the observation that most of the matrices inverted in the previous method are similar.

The improved method If we consider a 1 A current injected in $i$ and flowing out in $j$, the Millman system can be rewritten as follows:

$$\forall k \in V \setminus \{i,j\}, \quad \sum_{l \in N(k)} c_{kl} (V_k - V_l) = 0$$

$$\sum_{l \in N(i)} c_{il} (V_i - V_l) = 1$$

This can be written:

$$AV = v$$

with $A$ the matrix built from the conductance matrix by letting the entry $(k,k)$ be $-\sum_{l \in N(k)} c_{kl}$, and $v$ the vector with all entries 0 except the $i$-th one, 1, and the $j$-th one, $-1$.

However, $A$ is not invertible. Indeed, in this system, no potential is specified and the potential is defined up to a constant. Thus we build a matrix $A_2$ by replacing one of the lines of $A$ (for instance the first one) by the corresponding line of the identity matrix, to set a potential in the system.




Figure 2: example 



Thus

$$A_2 V = v_2$$

where $v_2$ is obtained from $v$ by replacing the first line (the same line index corresponding to the potential set) by an arbitrary value, 1 for instance.

Now, $A_2$ is invertible, and $A_2^{-1} v_2$ is a solution to $AV = v$ that provides the potentials on each node. The equivalent resistance between $i$ and $j$ is thus

$$R_{ij} = A_2^{-1}(i,j) - A_2^{-1}(j,j) - A_2^{-1}(i,i) + A_2^{-1}(j,i)$$

and consequently the hitting times can be deduced by formula (1).

The resistances between all pairs of nodes can thus be computed by inverting a single matrix, with an $O(n^3)$ complexity, while the basic method required an $O(n^4)$ complexity.

The solution to compute the hitting times presented in [15] is based on matrix computations and also has an $O(n^3)$ complexity. However, our method provides extra information that allows an efficient computation of the hitting times in the first sense and of the cover time, which the method in [15] does not.
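
The single-inversion idea can be sketched as follows, under stated assumptions: the sketch uses the graph Laplacian (positive diagonal, so the opposite sign of the matrix $A$ above), pins the potential of node 0 by replacing the first row with a row of the identity matrix, and then reads every equivalent resistance from four entries of the inverse. The hitting times are then obtained from formula (1). The names `invert`, `all_resistances` and `hitting_times_from_resistances`, and the sample path graph, are illustrative only.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination (exact fractions)."""
    n = len(matrix)
    m = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
         for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def all_resistances(conductance):
    """Equivalent resistances between all pairs, with one matrix inversion.

    conductance[i][j] = c(i, j), symmetric, 0 when i and j are not adjacent
    (and 0 on the diagonal).
    """
    n = len(conductance)
    # Laplacian: sum of conductances on the diagonal, -c(i, j) elsewhere.
    lap = [[sum(conductance[i]) if i == j else -conductance[i][j]
            for j in range(n)] for i in range(n)]
    lap[0] = [1] + [0] * (n - 1)  # pin the potential of node 0
    inv = invert(lap)
    return [[inv[i][i] - inv[i][j] - inv[j][i] + inv[j][j]
             for j in range(n)] for i in range(n)]


def hitting_times_from_resistances(conductance, r):
    """Apply formula (1): h_ij = w(G) R_ij + 1/2 sum_k w(k) (R_jk - R_ik)."""
    n = len(conductance)
    w = [sum(row) for row in conductance]   # w(k)
    w_g = Fraction(sum(w)) / 2              # w(G): each edge counted once
    return [[w_g * r[i][j]
             + sum(w[k] * (r[j][k] - r[i][k]) for k in range(n)) / 2
             for j in range(n)] for i in range(n)]


# Hypothetical path 0 - 1 - 2 with unit conductances.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
R = all_resistances(path)
H = hitting_times_from_resistances(path, R)
```

The inversion dominates the cost, and the double loop that reads the resistances is only quadratic, which matches the overall $O(n^3)$ behaviour discussed above.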

An increase in the conductance between two nodes in a circuit can only increase the global conductance of the circuit, and this by a factor of at most $1 + \omega(i,j) R_{ij}$. Thus, adding a new edge $(i,j)$ in a graph can increase the commute time by a factor of at most $1 + \frac{\omega(i,j)}{\omega(G)}$. This remark is known as Rayleigh's shortcut principle.

This principle shows that in a dense network, the addition or the removal of an edge does not modify the hitting times much, namely by a factor of at most $1 + \frac{\omega(i,j)}{\omega(G)}$. Thus, the computation of the hitting times of a graph provides results for a wide array of graphs deduced from the first one by removing or adding a few edges.



3.3 Example 

For the graph in Figure 2, the method yields in turn the matrix $A_2$, its inverse $A_2^{-1}$, the resistance matrix and the hitting times matrix (matrix entries omitted).

3.4 Hitting times in the first sense

Hitting times in the first sense can then be computed thanks to the following formula ($n_k$ is the expected number of visits to $k$ in a walk from $i$ to $j$):

hij = uj(i)Rij 2_^ Rki 
kev 
Indeed, the electrical potential is proportional to the average number of visits $n_i$ ([16]) to a site $i$ in a walk from the site of potential 1 to the site of potential 0 ([4]), since

„ w(i) 



n, : 



E 






and the potential also solves this equation. All of these solutions being proportional, $n_i = n_1 V_i$ if 1 is the site with potential 1. Now, the probability that the token hits 0 before going back to 1 is $\frac{2\omega(G)}{\omega(1)\kappa_{10}}$, as shown in [16]. The expected number of returns to 1 before hitting 0 is thus:

+oo / _ , / ~- s \ k 



Ei au 



2uj{G) \ 2u>(G) 



,_ ) -j(l)«io/ w(l)«io 

trit^K w(i)«io/ w(i)«io 

hthA w(l)«io/ w(1)kio 
2cj(g) x 1 y? / 2cj(G) x ' 

oj(1)kio 



f=l 



_ o;(1)kio 
2w(G) 

Thus, n fe = n,^f^ (V» = 0, 1$ = 1) and, 



tin = H nk 

kev 

711 ^ Vi - Vi 

kev 3 l 

_ u)(i)Kij >-^ Vfc - Vi 



;(i).Rij ^ .Rfej 



fcev 



3.5 Cyclic cover time 

The cyclic cover time is defined as:

$$\min \left\{ \sum_{i=1}^{n} h_{i\sigma(i)} \;/\; \sigma \in \mathfrak{S}_n \right\}$$

with $\mathfrak{S}_n$ the set of cyclic permutations of order $n$.

The cyclic cover time is an upper bound of the cover time. It represents the average time for a walk to visit all vertices in the best deterministic order. (TTJ [H] make use of the cyclic cover time to bound the cover time.

[S] computes the cyclic cover time thanks to a travelling salesperson formulation, which includes a prior computation of all the hitting times. Our algorithm can speed up the first phase of this computation.

4 Computation of the Cover Time 

The cover time is the expected time for a random walk starting from a given node to visit all the nodes in the graph. In terms of random walk based distributed algorithms, this is the time required to broadcast a piece of information to all computers taking part in the process. In the algorithm in [1] that builds a spanning tree, the cover time is the average time after which the algorithm has built a spanning tree (note that for the fault-tolerant algorithms based on it, the stabilization time is the cover time of the graph).

In this section, we first reformulate the problem in terms of hitting times on a graph $\mathcal{G}$, then give an algorithm providing the cover time. This algorithm is improved in the next subsection and we conclude by providing an example.

To compute the cover time, we need a criterion to determine whether every vertex has been visited by the token. Consider $G = (V, E)$ the undirected connected graph modeling a distributed system. We build from $G$ an associated graph $\mathcal{G}$ so that the cover time of $G$ can be expressed in terms of hitting times in $\mathcal{G}$. To express results on cover time using hitting times, we have to take into account the token trajectory. So $\mathcal{G}$ should reflect some history-dependent data.

In this section, we limit the reasoning to unweighted graphs to avoid needlessly large equations. Nevertheless, all of them hold with weighted graphs.

4.1 Construction of the associated graph $\mathcal{G}$

First, let us define $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ where $\mathcal{V}$ is a set of nodes and $\mathcal{E}$ a set of directed edges.

• $x \in \mathcal{V}$ is defined by $x = (P, i)$ with $P \in \mathcal{P}(V)$, where $\mathcal{P}(V)$ is the power set of $V$ (the set of nodes of $G$), and $i \in V$. $P$ represents the set of nodes in $G$ already visited by the token, and $i \in V$ the vertex on which the token currently is.

• any edge $(x,y) \in \mathcal{E}$ is of the form $(x,y) = ((P,i),(Q,j))$ with $(x,y) \in \mathcal{V} \times \mathcal{V}$ and $(i,j) \in E$.

Suppose that, initially, the token is at node $i$ in $G$, then moves to $j$, a neighbor of $i$, and then moves back to $i$. In the associated graph $\mathcal{G}$, we have the following path: $((\{i\}, i); (\{i,j\}, j); (\{i,j\}, i))$. Note that $\mathcal{E}$ is a set of directed edges $((P,i),(Q,j))$. Edges in $\mathcal{E}$ are defined by:

• $((P,i),(P,j))$, where $i \in P$ and $j \in P$ are neighbors; this case corresponds to a token transmission to the node $j$, which has already been visited by the token.

• $((P,i),(P \cup \{j\}, j))$, where $i \in P$ and $j \notin P$ are neighbors; this case corresponds to a token transmission to the node $j$, which is holding the token for the first time.



The probability of obtaining a given path in $G$ is equal to the probability of obtaining the associated path in $\mathcal{G}$. Indeed, for $i \in P \subseteq V$ and $j \in V$, there exists some $Q \subseteq V$ such that the transition probability from $(P, i)$ to $(Q, j)$ and the transition probability from $i$ to $j$ are equal: $Q = P$ if $j \in P$, else $Q = P \cup \{j\}$.
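
The correspondence between a walk in $G$ and its path in the associated graph can be sketched in a few lines. The function name `associated_path` and the use of `frozenset` for the visited set are assumptions made for this illustration.

```python
def associated_path(walk):
    """Map a walk in G (a list of nodes) to its path of states (P, i).

    P is the set of nodes visited so far and i the current node:
    moving to j keeps P unchanged if j was already visited,
    and replaces it by P U {j} otherwise.
    """
    visited = frozenset()
    states = []
    for node in walk:
        visited = visited | {node}
        states.append((visited, node))
    return states


# The example of the text: the token goes from i to j and back to i.
path = associated_path(["i", "j", "i"])
```

Each step of the walk yields exactly one state, so a path in $G$ and its image have the same length and the same probability.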
Figure 3: G




A token in $G$ has visited every node iff the associated token in $\mathcal{G}$ has reached a node $(P,i)$ such that $P = V$. We deduce that the cover time in $G$ starting from a node $i$ is the average time it takes a token in $\mathcal{G}$ starting from $(\{i\}, i)$ to reach for the first time any node $(V, k)$, that is

$$C_i(G) = h_{(\{i\},i),\{(V,k)/k \in V\}}(\mathcal{G})$$

The token has covered $G$ when the associated token in $\mathcal{G}$ has hit any vertex in $F = \{(V,k)/k \in V\}$. We do not care which node $(V,k)$ the token reaches in $\mathcal{G}$, so we lump all nodes in $F$ into a single node called $f$ (in fact we obtain an absorbing Markov chain). Now, the cover time in $G$ is obtained as the average number of steps needed before entering $f$ starting from node $(\{i\}, i)$.



4.2 Cover time computation 

$\mathcal{G}$ being directed, we cannot apply the procedure of Section 3 to compute $h_{(\{i\},i),\{(V,k)/k \in V\}}(\mathcal{G})$.
Let $M(x)$ be the set of vertices that have an incoming edge from $x$: $\{y \in \mathcal{V} / (x,y) \in \mathcal{E}\}$.
Since $f$ can be reached from any vertex (if not, some of the $h_{xf}$ would be undefined), we have:

$$\begin{cases}
\forall x \in \mathcal{V}, \quad h_{xf} = 1 + \sum_{y \in M(x)} p_{xy} h_{yf} \\
h_{ff} = 0
\end{cases} \qquad (6)$$

The square linear system (6) has a single solution (the vector $h_f$), so the hitting times between all nodes and a given node can be computed by inverting one matrix.

Thus, the cover time of any graph $G$ is computed by building $\mathcal{G}$ and by computing $h_{(\{i\},i),f}(\mathcal{G})$, which requires the inversion of a matrix of size approximately $n2^n \times n2^n$.
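
A minimal sketch of this computation, for unweighted graphs, is given below. It explores only the states $(P, i)$ reachable from $(\{i\}, i)$, lumps all states with $P = V$ into $f$, and solves system (6) exactly. The names `solve` and `cover_time` and the sample 4-vertex graph are assumptions made for illustration, and the graph is assumed to be connected with at least two vertices.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact)."""
    n = len(b)
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def cover_time(adj, start):
    """Cover time from `start`; adj[i] lists the neighbours of i."""
    everything = frozenset(adj)
    first = (frozenset([start]), start)
    # Enumerate the reachable states (P, i) with P != V.
    states, stack = {first: 0}, [first]
    while stack:
        visited, i = stack.pop()
        for j in adj[i]:
            nxt = (visited | {j}, j)
            if nxt[0] != everything and nxt not in states:
                states[nxt] = len(states)
                stack.append(nxt)
    # System (6): h_x - sum_y p_xy h_y = 1, with h_f = 0 dropped.
    a = [[Fraction(0)] * len(states) for _ in states]
    b = [Fraction(1)] * len(states)
    for (visited, i), row in states.items():
        a[row][row] += 1
        for j in adj[i]:
            nxt = (visited | {j}, j)
            if nxt[0] != everything:
                a[row][states[nxt]] -= Fraction(1, len(adj[i]))
    return solve(a, b)[0]  # index 0 is the initial state ({start}, start)


# Hypothetical 4-vertex graph: edges 1-2, 1-3, 1-4, 2-4 and 3-4.
diamond = {1: [2, 3, 4], 2: [1, 4], 3: [1, 4], 4: [1, 2, 3]}
c1 = cover_time(diamond, 1)
```

The number of states grows like $n 2^n$, so this direct approach is only practical for small graphs.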

Let $G$ be the graph in Figure 3. Then $\mathcal{G}$ is partially represented in the associated figure.

In this figure, we use the following notation: $ijk$ corresponds to node $(\{i,j,k\},k)$ (e.g. 31 corresponds to $(\{1,3\},1)$ and 13 corresponds to $(\{1,3\},3)$). We only built the part of $\mathcal{G}$ that corresponds to situations where the token started in node 1. We did not write the states in which all vertices are visited: for the sake of legibility, we circled the sites that lead to such a state. Thus, in state 134, the token will reach 2 and finish covering the graph with probability $\frac{1}{3}$, or reach 3 (the state being 143) or 1 (341), each also with probability $\frac{1}{3}$.

Since we merge all the states in which the token has covered the graph, every circled state leads to the new site $f$ with a directed edge. We did not write unreachable sites.
The matrix of $\mathcal{G}$ and the system that we have to solve to obtain the cover time are then written down (matrix entries omitted). When solving this system, we obtain the vector of hitting times to $f$, which begins $\left(\frac{34}{5}, \frac{109}{20}, \frac{13}{2}, \frac{109}{20}, \frac{49}{10}, \dots\right)$ and ends with $h_{ff} = 0$.

Thus, $C_1 = \frac{34}{5}$.

4.3 Efficient cover time computation

The matrix to be inverted in the previous method is large (about $n2^n \times n2^n$: we only provide an upper bound since the graph size can be reduced by suppressing the unreachable states), leading to a complexity approaching $n^3 8^n$.

However, this graph has some particularities that we want to exploit in order to improve the efficiency of the computation. The subgraphs constituted by all the vertices $(P, i)$ with the same $P$ are undirected. The time it takes to reach $f$ from $i$ in $Q$, when $P$ is the set of already visited vertices, can be decomposed into the time it takes to reach the first vertex $j$ outside $P$ plus the expected time from $(P \cup \{j\}, j)$ to $f$ (the expectation being computed over all possible $j$, weighted by their probabilities of being the first vertex hit outside $P$). Thus, the cover time can be computed according to:



$$h_{(P,i),f} = 1 + s(P,i) + \sum_{j \in V \setminus P} p(P,i,j)\, h_{(P \cup \{j\},j),f}$$



where 



• $p(P,i,j)$ is the probability that the first vertex outside $P$ hit by a random walk starting at $i$ is $j$;

• $s(P,i)$ is the average time the walk starting at $i$ stays in $P$.

$1 + s(P, i)$ is the expected time the walk will spend in the strongly connected component defined by $P$ when it is on $i$. The next newly visited site is $j$ with probability $p(P,i,j)$, and once on this site, the walk will take an expected time of $h_{(P \cup \{j\},j),f}$ to complete the cover.

Thus, the equation above can be decomposed into $1 + s(P,i)$, which represents the time spent in a strongly connected component, and $\sum_j p(P,i,j)\, h_{(P \cup \{j\},j),f}$, which represents the expected time to reach $f$ in the directed acyclic graph of strongly connected components.

We can express both of those quantities in terms of equivalent resistances and potentials, making it possible to use results from the previous section: for any $i$ in $P$ ($i$ represents the current location of the token) and $j$ in $V \setminus P$ ($j$ represents the first site the token will reach outside $P$)

• $p(P,i,j)$ is the potential in $i$ when $V_j = 1$ and all other sites in $V \setminus P$ have potential 0
• $s(P,i) = h_{i,(V \setminus P)}(G(P \cup \{j\}))$

Those quantities can be computed thanks to a $|P| \times |P|$ matrix inversion.

Indeed, we have already remarked that the potential on a node $c$, when a given node $a$ has potential 0 and another node $b$ has potential 1, is the probability of hitting $b$ before $a$ when the current node is $c$. Thus, the potential on $i$ when $V_j = 1$ and all other sites in $V \setminus P$ have potential 0 is the probability that the next newly visited vertex is $j$.

$s(P, i)$ is the average time the token spends in $P$, since it is the expected time to reach a node in $V \setminus P$.
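To make the $|P| \times |P|$ inversion concrete, both quantities can be written as solutions of linear systems that share one matrix. This is a sketch under assumptions that are not stated in the text: $T_P$ denotes the transition matrix of the walk restricted to the rows and columns of $P$, $r_j$ is the column of one-step probabilities from the vertices of $P$ to $j$, and $s(P,\cdot)$ is read as counting only the steps that stay inside $P$, so that $1 + s(P,i)$ is the expected number of steps needed to leave $P$ from $i$.

$$
(I - T_P)\, p(P,\cdot,j) = r_j \quad \text{for each } j \in V \setminus P, \qquad (I - T_P)\,\bigl(\mathbf{1} + s(P,\cdot)\bigr) = \mathbf{1}.
$$

Under these assumptions, a single inversion (or factorization) of $I - T_P$ serves every right-hand side.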

In Figure 5 we represented $Q$ and circled the subgraphs that are undirected. Each of them is also a subgraph (connected and containing 1) of $G$. We have to compute the time the random walk spends in each of these subgraphs, depending on its arrival point. In Figure 6 we highlighted the directed edges joining the various subgraphs: each of them represents the discovery of a new vertex. We have to compute the probability that the walk crosses each of these edges, depending on the vertex of the subgraph it arrives on. Then, using this information, we can compute the cover time with the above formula.
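The procedure can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the names `solve`, `exit_law` and `cover_time` are hypothetical, the graph is assumed to be connected and given as a dictionary `w[u][v]` of symmetric positive integer weights, and time is counted in steps of the walk. For each visited set $P$ it solves one exact $|P| \times |P|$ system, which yields both the probabilities $p(P,i,j)$ and the expected number of steps needed to leave $P$ (the term $1 + s(P,i)$).

```python
from fractions import Fraction
from functools import lru_cache

def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination (A must be invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in A[r]] + [Fraction(x) for x in B[r]] for r in range(n)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def cover_time(w, start):
    """Expected number of steps needed to visit every vertex, starting from `start`."""
    V = frozenset(w)
    deg = {u: sum(w[u].values()) for u in w}

    @lru_cache(maxsize=None)
    def exit_law(P):
        # One |P| x |P| system per visited set P (P must be a proper subset of V).
        inside, outside = sorted(P), sorted(V - P)
        A = [[int(i == k) - Fraction(w[i].get(k, 0), deg[i]) for k in inside]
             for i in inside]
        # One right-hand side per outside vertex j, plus a column of ones
        # whose solution is the expected number of steps to leave P.
        B = [[Fraction(w[i].get(j, 0), deg[i]) for j in outside] + [1]
             for i in inside]
        X = solve(A, B)
        return {i: (dict(zip(outside, X[r][:-1])), X[r][-1])
                for r, i in enumerate(inside)}

    @lru_cache(maxsize=None)
    def h(P, i):
        if P == V:                      # every vertex visited: merged state f
            return Fraction(0)
        p, leave = exit_law(P)[i]       # leave plays the role of 1 + s(P, i)
        return leave + sum(q * h(P | {j}, j) for j, q in p.items() if q)

    return h(frozenset([start]), start)

# Complete graph on 4 vertices and a 3-vertex chain, unit weights.
K4 = {u: {v: 1 for v in range(4) if v != u} for u in range(4)}
chain = {1: {2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
print(cover_time(K4, 0))     # 11/2
print(cover_time(chain, 1))  # 4
```

Exact rational arithmetic is used here only to keep the small examples easy to check; a floating-point linear solver would be the natural choice for larger graphs.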

The cost of this procedure is one $k \times k$ matrix inversion for each subgraph of size $k$ that appears. The complexity is then at most $O(n^3 2^n)$. However, it highly depends on the topology of $G$. If $G$ is a chain, only $n$ subgraphs appear (a subgraph occurs in the computation iff it contains the vertex 1 and is connected), and the complexity is $O(n^4)$.
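The dependence on the topology can be illustrated by enumerating the visited sets that actually occur. The sketch below is only an illustration: `visited_sets` and `cubic_cost` are hypothetical names, the graph is given as an adjacency dictionary, vertex 1 of the chain is assumed to be one of its endpoints, and the sum of $|P|^3$ is a rough proxy for the inversion work, not a measured cost.

```python
def visited_sets(adj, start):
    """Enumerate the connected vertex sets that contain `start`."""
    first = frozenset([start])
    seen, stack = {first}, [first]
    while stack:
        P = stack.pop()
        for u in P:
            for v in adj[u]:
                if v not in P:          # discovering v extends P by one vertex
                    Q = P | {v}
                    if Q not in seen:
                        seen.add(Q)
                        stack.append(Q)
    return seen

def cubic_cost(sets):
    """Rough proxy for the work: one cubic-time inversion per set."""
    return sum(len(P) ** 3 for P in sets)

n = 6
chain = {i: [j for j in (i - 1, i + 1) if 1 <= j <= n] for i in range(1, n + 1)}
complete = {u: [v for v in range(5) if v != u] for u in range(5)}

print(len(visited_sets(chain, 1)))      # 6, i.e. n sets for a chain
print(cubic_cost(visited_sets(chain, 1)))  # 441 = 1 + 8 + 27 + 64 + 125 + 216
print(len(visited_sets(complete, 0)))   # 16 = 2**(5 - 1) sets for a complete graph
```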
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Figure 5: G 



Figure 6: Q 



This rather expensive computation is robust to the topological evolution of the graph, thanks to Rayleigh's shortcut principle presented above. Since the cover time computation is based on a hitting time computation, adding or removing a limited number of edges in the graph does not modify the cover time by more than the ratio of the weights of the modified edges to the global weight of the graph.

5 Conclusion 

Random walk based algorithms represent an important class of distributed algorithms. Two of their main features are that they require no assumptions on the topology of the network and that they can easily handle topological changes without any special procedure triggered by a change. The exact computation of hitting and cover times makes it possible to compute the complexity of these algorithms.

Further research can be conducted based on the exact computation of the hitting and cover times. These results are more precise than previous results, which were approximations. We plan to survey the hitting and cover times over various topologies, ranging from classical topologies, like hypercubes or tori, to topologies modeling actual large-scale distributed systems, like small-world graphs, some categories of random graphs and maps of parts of peer-to-peer file-sharing networks. We hope this work will provide insights into the topologies to consider in order to achieve a good behavior of the walk and into the impact of a slight difference between the actual topology of a network and the intended topology.

Thus, the hitting and cover times allow us to determine the complexity of a wide class of algorithms, but we can also improve these algorithms by choosing the topologies in which they are efficient.
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Appendices 

The Cover Time of the Complete Graph 

Let $G = (V, E)$ be the complete graph on $n$ vertices. Let $C$ be the average time a random walk on $G$ takes to visit every vertex in $G$ (the cover time; note that the starting vertex does not matter here, since the graph is symmetric). Let $C_k$ be the average time the random walk takes to visit the $(k+1)$-st vertex once it has visited the $k$-th one.
Then,

$$C = \sum_{k=1}^{n-1} C_k.$$






When the walk has visited $k$ vertices, at the next step, it has a $\frac{n-k}{n-1}$ chance to visit a new vertex, and a $\frac{k-1}{n-1}$ chance to visit an already known one. Thus, the expected time to visit a new vertex, when $k$ vertices have already been visited, is:



$$
\begin{aligned}
C_k &= \sum_{i \in \mathbb{N}^*} i \left(\frac{k-1}{n-1}\right)^{i-1} \frac{n-k}{n-1} \\
    &= \frac{n-k}{n-1} \sum_{j \in \mathbb{N}^*} \sum_{i \ge j} \left(\frac{k-1}{n-1}\right)^{i-1} \\
    &= \frac{n-k}{n-1} \sum_{j \in \mathbb{N}^*} \frac{\left(\frac{k-1}{n-1}\right)^{j-1}}{1 - \frac{k-1}{n-1}} \\
    &= \frac{n-k}{n-1} \left(\frac{1}{1 - \frac{k-1}{n-1}}\right)^{2}
     = \frac{n-k}{n-1} \left(\frac{n-1}{n-k}\right)^{2} \\
    &= \frac{n-1}{n-k}.
\end{aligned}
$$



Then:

$$C = \sum_{k=1}^{n-1} C_k = \sum_{k=1}^{n-1} \frac{n-1}{n-k} = (n-1) \sum_{k=1}^{n-1} \frac{1}{n-k} = (n-1) \sum_{j=1}^{n-1} \frac{1}{j} = (n-1) H_{n-1}$$

with $H_n$ the $n$-th harmonic number.

Thus

$$C \sim_{n \to \infty} n \log n$$
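As a quick numerical check of the closed form, the following sketch evaluates $(n-1) H_{n-1}$ exactly and compares it with $n \log n$. The function name `complete_graph_cover_time` is hypothetical and introduced only for this illustration.

```python
from fractions import Fraction
from math import log

def complete_graph_cover_time(n):
    """Cover time (n - 1) * H_{n-1} of the complete graph on n vertices."""
    return (n - 1) * sum(Fraction(1, j) for j in range(1, n))

print(complete_graph_cover_time(2))  # 1: the only other vertex is reached in one step
print(complete_graph_cover_time(4))  # 11/2

# The ratio to n log n decreases slowly towards 1 as n grows.
for n in (10, 100, 1000):
    print(n, float(complete_graph_cover_time(n)) / (n * log(n)))
```

The convergence is slow because $H_{n-1} - \log n$ tends to a constant (the Euler-Mascheroni constant), so the ratio exceeds 1 by a term of order $1/\log n$.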
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